Cross Validation





A Honest Cross-Validation Estimator for Prediction Performance

Pan, Tianyu, Yu, Vincent Z., Devanarayan, Viswanath, Tian, Lu

arXiv.org Machine Learning

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits the data, trains the prediction model on the training set, evaluates its performance on the test set, and averages that performance across data splits. A well-known criticism is that such a cross-validation procedure does not directly estimate the performance of the particular model recommended for future use. In this paper, we propose a new method to estimate the performance of a model trained on a specific (random) training set. A naive estimator can be obtained by applying the model to a disjoint testing set. Surprisingly, cross-validation estimators computed from other random splits can be used to improve this naive estimator within a random-effects model framework. We develop two estimators -- a hierarchical Bayesian estimator and an empirical Bayes estimator -- that perform similarly to or better than both the conventional cross-validation estimator and the naive single-split estimator. Simulations and a real-data example demonstrate the superior performance of the proposed method.
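To make the contrast in this abstract concrete, here is a minimal Python sketch, assuming an ordinary least-squares model, squared-error loss, and random 70/30 splits (all illustrative choices not taken from the paper). It computes the naive single-split estimate for one designated split, the conventional cross-validation average over the other splits, and a generic normal-normal shrinkage of the former toward the latter. The shrinkage step is a textbook empirical-Bayes device used only to illustrate borrowing strength from other splits; it is not the authors' hierarchical Bayesian or empirical Bayes estimator, and it ignores the correlation between overlapping splits. The helper names `split_errors` and `split_estimates` are hypothetical.

```python
import numpy as np

def split_errors(X, y, train_idx, test_idx):
    """Fit ordinary least squares on the training rows; return per-sample squared test errors."""
    coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ coef
    return resid ** 2

def split_estimates(X, y, n_splits=50, test_frac=0.3, seed=0):
    """Return (naive, cv_mean, combined) MSE estimates; split 0 is the model kept for future use."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    means, noise_vars = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        errs = split_errors(X, y, train_idx, test_idx)
        means.append(errs.mean())
        noise_vars.append(errs.var(ddof=1) / n_test)  # sampling variance of this split's mean
    means, noise_vars = np.array(means), np.array(noise_vars)

    naive = means[0]                 # naive estimator: the single designated split
    cv_mean = means[1:].mean()       # conventional CV: average over the other splits
    # Generic empirical-Bayes shrinkage (illustrative, NOT the paper's estimator):
    # between-split variance tau2 = spread of split means minus average sampling noise.
    tau2 = max(means.var(ddof=1) - noise_vars.mean(), 0.0)
    s2 = noise_vars[0]
    w = tau2 / (tau2 + s2) if tau2 + s2 > 0 else 0.0
    combined = w * naive + (1.0 - w) * cv_mean
    return naive, cv_mean, combined

# Usage on synthetic linear-regression data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=200)
naive, cv_mean, combined = split_estimates(X, y)
print(f"naive={naive:.3f}  cv={cv_mean:.3f}  combined={combined:.3f}")
```

Because the weight `w` lies in [0, 1], the combined estimate always falls between the naive and conventional estimates; how far it moves toward the CV average depends on how much the split-level estimates vary beyond their sampling noise.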


Region-of-Interest Augmentation for Mammography Classification under Patient-Level Cross-Validation

Bigdeli, Farbod, Mohammadagha, Mohsen, Bigdeli, Ali

arXiv.org Artificial Intelligence

Breast cancer screening with mammography remains central to early detection and mortality reduction. Deep learning has shown strong potential for automating mammogram interpretation, yet limited-resolution datasets and small sample sizes continue to restrict performance. We revisit the Mini-DDSM dataset (9,684 images; 2,414 patients) and introduce a lightweight region-of-interest (ROI) augmentation strategy. During training, full images are probabilistically replaced with random ROI crops sampled from a precomputed, label-free bounding-box bank, with optional jitter to increase variability. We evaluate under strict patient-level cross-validation and report ROC-AUC, PR-AUC, and training-time efficiency metrics (throughput and GPU memory). Because ROI augmentation is training-only, inference-time cost remains unchanged. On Mini-DDSM, ROI augmentation (best: p_roi = 0.10, alpha = 0.10) yields modest average ROC-AUC gains, with performance varying across folds; PR-AUC is flat to slightly lower. These results demonstrate that simple, data-centric ROI strategies can enhance mammography classification in constrained settings without requiring additional labels or architectural modifications.
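The training-time ROI replacement described above can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions: the bounding-box bank is given as pixel-coordinate tuples `(y0, x0, y1, x1)`, jitter is modeled as a random translation of the box by up to `jitter_frac` of its height and width, and the names `roi_augment` and `jitter_frac` are hypothetical (how the paper's `alpha` maps onto jitter is not stated in the abstract). Resizing the crop back to the network's input size, which a real pipeline would need, is omitted to keep the sketch dependency-free.

```python
import numpy as np

def roi_augment(image, box_bank, p_roi, jitter_frac, rng):
    """With probability p_roi, replace the full image by a jittered crop drawn from box_bank.

    image: array of shape (H, W) or (H, W, C).
    box_bank: label-free boxes (y0, x0, y1, x1) in pixel coordinates (hypothetical format).
    jitter_frac: maximum translation as a fraction of box height/width (hypothetical).
    """
    if not box_bank or rng.random() >= p_roi:
        return image  # keep the full image
    h, w = image.shape[:2]
    y0, x0, y1, x1 = box_bank[rng.integers(len(box_bank))]
    # Random translation of up to jitter_frac of the box size in each direction.
    dy = int(round(jitter_frac * (y1 - y0) * rng.uniform(-1.0, 1.0)))
    dx = int(round(jitter_frac * (x1 - x0) * rng.uniform(-1.0, 1.0)))
    # Shift, then clip to the image so the crop is always non-empty.
    y0 = int(np.clip(y0 + dy, 0, h - 1))
    y1 = int(np.clip(y1 + dy, y0 + 1, h))
    x0 = int(np.clip(x0 + dx, 0, w - 1))
    x1 = int(np.clip(x1 + dx, x0 + 1, w))
    return image[y0:y1, x0:x1]

# Usage: p_roi=0.0 always keeps the full image; p_roi=1.0 always crops.
rng = np.random.default_rng(0)
img = np.arange(100 * 80, dtype=np.float32).reshape(100, 80)
bank = [(10, 10, 40, 50), (50, 20, 90, 70)]
full = roi_augment(img, bank, p_roi=0.0, jitter_frac=0.1, rng=rng)
crop = roi_augment(img, bank, p_roi=1.0, jitter_frac=0.1, rng=rng)
print(full.shape, crop.shape)
```

Because a function like this would only be called in the training data loader, the inference path still sees full images, which is consistent with the abstract's point that inference-time cost is unchanged.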


Technical note on Fisher Information for Robust Federated Cross-Validation

Khan, Behraj, Syed, Tahir Qasim

arXiv.org Machine Learning

When training data are fragmented across batches or distributed across geographic locations for federated learning, trained models suffer performance degradation. This degradation is partly due to covariate shift: fragmenting the data across time and space produces dissimilar empirical training distributions. Each fragment's distribution differs slightly from a hypothetical unfragmented training distribution of covariates, and from the single validation distribution. To address this problem, we propose Fisher Information for Robust fEderated validation (FIRE). This method accumulates fragmentation-induced covariate shift divergences from the global training distribution via an approximate Fisher information. That term, which we prove to be a more computationally tractable estimate, is then used as a per-fragment loss penalty, enabling scalable distribution alignment. FIRE outperforms importance-weighting benchmarks by up to 5.1% and federated learning (FL) benchmarks by up to 5.3% on shifted validation sets.
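The abstract does not spell out FIRE's penalty, but the usual reason a Fisher information term can stand in for a divergence is the standard second-order expansion of the Kullback-Leibler divergence for a parametric family with parameters theta. The notation below is ours, and the exact term used by FIRE may differ.

```latex
% Second-order expansion of the KL divergence around theta (standard result)
D_{\mathrm{KL}}\!\left(p_{\theta} \,\|\, p_{\theta+\delta}\right)
  = \tfrac{1}{2}\,\delta^{\top} F(\theta)\,\delta + O\!\left(\|\delta\|^{3}\right),
\qquad
F(\theta) = \mathbb{E}_{x \sim p_{\theta}}\!\left[\nabla_{\theta}\log p_{\theta}(x)\,\nabla_{\theta}\log p_{\theta}(x)^{\top}\right].

% Worked check: Gaussian mean shift with known variance sigma^2
p_{\mu} = \mathcal{N}(\mu, \sigma^{2}):\quad
F(\mu) = \frac{1}{\sigma^{2}},\qquad
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu,\sigma^{2}) \,\|\, \mathcal{N}(\mu+\delta,\sigma^{2})\right)
  = \frac{\delta^{2}}{2\sigma^{2}} = \tfrac{1}{2}\,\delta^{2} F(\mu).
```

In the Gaussian mean-shift case the quadratic form is exact; in general it is only a local approximation, which is one reason a quadratic Fisher term can be more tractable to work with than the divergence itself.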



Regularization Path of Cross-Validation Error Lower Bounds

Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, Ichiro Takeuchi

Neural Information Processing Systems

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, the current practice of regularization parameter tuning is more of an art than a science; for example, it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error over the entire range of regularization parameters. Our numerical experiments demonstrate that a theoretically guaranteed choice of the regularization parameter in this sense is possible at reasonable computational cost.
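The approximation guarantee described in this abstract can be written out as follows; the notation is ours, not necessarily the paper's. Let E(lambda) be the CV error at regularization parameter lambda in a range [lambda_l, lambda_u], let LB(lambda) be any function with LB(lambda) <= E(lambda) on that range, and let E_best be the smallest CV error among the parameters actually evaluated.

```latex
E_{\mathrm{best}} - \min_{\lambda \in [\lambda_l,\lambda_u]} E(\lambda)
  \;\le\; E_{\mathrm{best}} - \min_{\lambda \in [\lambda_l,\lambda_u]} \mathrm{LB}(\lambda)
  \;=:\; \varepsilon,
\qquad\text{since } \mathrm{LB}(\lambda) \le E(\lambda)\ \ \forall \lambda
\ \Rightarrow\ \min_{\lambda}\mathrm{LB}(\lambda) \le \min_{\lambda} E(\lambda).
```

Once such a lower-bound path is available, the current best solution is certified to be within epsilon of the best attainable CV error over the whole range, so one way to use it is to keep evaluating new grid points until epsilon falls below a chosen tolerance.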


Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

Lamrani, Lamia, Collins, Benoît, Bouchaud, Jean-Philippe

arXiv.org Machine Learning

Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a particular cross-validation procedure that involves a single train and test split, for a generic rotationally invariant multiplicative noise model, thereby extending previous results to non-Gaussian data distributions. Our approach uses the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage and then with a quadratic shrinkage to approximate the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. We then run Monte Carlo simulations of the holdout error for different distributions of the norm of the noise, such as the Gaussian, Student, and Laplace distributions, and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth-order moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of the train-test ratio more important when performing the holdout method.
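The holdout procedure analyzed here can be illustrated numerically. The following Python/NumPy sketch assumes the common holdout construction for covariance estimation: eigenvectors are taken from the training split's sample covariance, and each eigenvalue is replaced by the test split's variance along the corresponding eigenvector. It compares the Frobenius error of that estimator with the error obtained from the oracle eigenvalues u_i^T Sigma u_i for the same eigenvectors. For simplicity it uses Gaussian data with a diagonal population covariance, so it does not reproduce the paper's non-Gaussian multiplicative noise model, inverse Wishart setting, or shrinkage approximations; the function names are hypothetical.

```python
import numpy as np

def holdout_eigen(X, train_frac, rng):
    """Return train eigenvectors U and holdout eigenvalues xi (assumes zero-mean rows)."""
    n = X.shape[0]
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    train, test = X[perm[:n_train]], X[perm[n_train:]]
    S_train = train.T @ train / len(train)
    S_test = test.T @ test / len(test)
    _, U = np.linalg.eigh(S_train)          # columns of U are train eigenvectors
    xi = np.sum(U * (S_test @ U), axis=0)   # xi_i = u_i^T S_test u_i
    return U, xi

def frobenius_error(U, d, Sigma):
    """Frobenius distance between U diag(d) U^T and the population covariance."""
    return np.linalg.norm(U @ np.diag(d) @ U.T - Sigma, "fro")

rng = np.random.default_rng(0)
p, n = 50, 400
pop_eigs = np.linspace(1.0, 5.0, p)
Sigma = np.diag(pop_eigs)
X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)  # Gaussian rows with covariance Sigma

results = {}
for frac in (0.5, 0.7, 0.9):
    U, xi = holdout_eigen(X, frac, rng)
    oracle = np.sum(U * (Sigma @ U), axis=0)   # oracle eigenvalues u_i^T Sigma u_i
    results[frac] = (frobenius_error(U, xi, Sigma), frobenius_error(U, oracle, Sigma))
    print(f"train_frac={frac}: holdout={results[frac][0]:.3f}  oracle={results[frac][1]:.3f}")
```

For fixed eigenvectors, the oracle eigenvalues minimize the Frobenius error, so the holdout error can never fall below the oracle error. Sweeping `train_frac` on the same data is a crude way to see the trade-off behind the optimal split ratio: more training samples give better eigenvectors, while fewer test samples give noisier eigenvalues.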


Embodied Intelligence in Disassembly: Multimodal Perception Cross-validation and Continual Learning in Neuro-Symbolic TAMP

He, Ziwen, Wang, Zhigang, Peng, Yanlong, Chang, Pengxu, Yang, Hong, Chen, Ming

arXiv.org Artificial Intelligence

With the rapid development of the new energy vehicle industry, the efficient disassembly and recycling of power batteries have become a critical challenge for the circular economy. In current unstructured disassembly scenarios, the dynamic nature of the environment severely limits the robustness of robotic perception, posing a significant barrier to autonomous disassembly in industrial applications. This paper proposes a continual learning framework based on Neuro-Symbolic task and motion planning (TAMP) to enhance the adaptability of embodied intelligence systems in dynamic environments. Our approach integrates a multimodal perception cross-validation mechanism into a bidirectional reasoning flow: the forward working flow dynamically refines and optimizes action strategies, while the backward learning flow autonomously collects effective data from historical task executions to facilitate continual system learning, enabling self-optimization. Experimental results show that the proposed framework improves the task success rate in dynamic disassembly scenarios from 81.68% to 100%, while reducing the average number of perception misjudgments from 3.389 to 1.128. This research provides a new paradigm for enhancing the robustness and adaptability of embodied intelligence in complex industrial environments.

I. INTRODUCTION

With the rapid development of Industry 4.0 and the circular economy, industrial disassembly has become a critical link in intelligent manufacturing and resource recycling, facing unprecedented technical challenges [1], [2].